Abstract

Gossip algorithms have recently received significant attention, mainly because they constitute simple and robust
message-passing schemes for distributed information processing over networks. However, for many topologies that
are realistic for wireless ad-hoc and sensor networks (like grids and random geometric graphs), the standard
nearest-neighbor gossip converges as slowly as flooding ($O(n^2)$ messages).

A recently proposed algorithm called geographic gossip improves gossip efficiency by a $\sqrt{n}$ factor, by exploiting
geographic information to enable multi-hop long distance communications. In this paper we prove that a variation
of geographic gossip that averages along routed paths improves efficiency by an additional $\sqrt{n}$ factor and is order
optimal ($O(n)$ messages) for grids and random geometric graphs.

We develop a general technique (travel agency method) based on Markov chain mixing time inequalities, which can 
give bounds on the performance of randomized message-passing algorithms operating over various graph topologies. 

I. Introduction 

Gossip algorithms are distributed message-passing schemes designed to disseminate and process information 
over networks. They have received significant interest because the problem of computing a global function of data 
distributively over a network, using only localized message-passing, is fundamental for numerous applications. 

These problems and their connections to mixing rates of Markov chains have been extensively studied starting 
with the pioneering work of Tsitsiklis [26]. Earlier work studied mostly deterministic protocols, known as average 
consensus algorithms, in which each node communicates with each of its neighbors in every round. More recent work 
(e.g. [12], [2]) has focused on so-called gossip algorithms, a class of randomized algorithms that solve the averaging 
problem by computing a sequence of randomly selected pairwise averages. Gossip and consensus algorithms have 
been the focus of renewed interest over the past several years [12], [3], [14], motivated by applications in sensor 
networks and distributed control systems.

The simplest setup is the following: n nodes are placed on a graph whose edges correspond to reliable
communication links. Each node is initially given a scalar (which could correspond to some sensor measurement like
temperature) and we are interested in solving the distributed averaging problem: namely, to find a distributed
message-passing algorithm by which all nodes can compute the average of all n scalars. A scheme that computes
the average can easily be modified to compute any linear function (projection) of the measurements as well as more
general functions. Furthermore, the scalars can be replaced with vectors and generalized to address problems like
distributed filtering and optimization as well as distributed detection in sensor networks [24], [27], [20]. Random
projections computed via gossip can be used for compressive sensing of sensor measurements and field estimation,
as proposed in [19]. Note that throughout this paper we will be interested in gossip algorithms that compute linear
functions, and will not discuss related problems like information dissemination (see e.g. [15], [21] and references
therein).

Gossip algorithms solve the averaging problem by having each node randomly pick one of its one-hop
neighbors and iteratively compute pairwise averages: initially, all the nodes start with their own measurement as an
estimate of the average. At each gossip round, they update this estimate with a pairwise average of current estimates
with a randomly selected neighbor. An attractive property of gossip is that no coordination is required for the
gossip algorithm to converge to the global average when the graph is connected: nodes can just randomly wake
up, select one of their one-hop neighbors randomly, exchange estimates and update their estimate with the average.
We will refer to this algorithm as standard or nearest-neighbor gossip.
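
As an illustration, the following minimal sketch simulates standard gossip under simplifying assumptions that are not part of the paper: one node wakes up per round, the graph is given as an adjacency dictionary, and the function name `standard_gossip` and the ring example are hypothetical.

```python
import random

def standard_gossip(neighbors, values, rounds, seed=0):
    """Run nearest-neighbor gossip for a fixed number of rounds.

    neighbors: dict mapping each node to a list of its one-hop neighbors
    values:    dict mapping each node to its initial scalar
    """
    rng = random.Random(seed)
    x = dict(values)
    nodes = list(neighbors)
    for _ in range(rounds):
        i = rng.choice(nodes)            # a random node wakes up
        j = rng.choice(neighbors[i])     # it picks a random one-hop neighbor
        x[i] = x[j] = (x[i] + x[j]) / 2  # both keep the pairwise average
    return x

# Hypothetical example: a ring of 6 nodes, node k starts with measurement k.
ring = {k: [(k - 1) % 6, (k + 1) % 6] for k in range(6)}
final = standard_gossip(ring, {k: float(k) for k in range(6)}, rounds=2000)
```

Each pairwise step preserves the sum of the estimates, so on a connected graph the estimates can only settle on the true average (2.5 in this example).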

A fundamental issue is the performance analysis of such algorithms, namely the communication (number of
messages passed between one-hop neighboring nodes) required before a gossip algorithm converges to a sufficiently
accurate estimate. For energy-constrained sensor network applications, communication corresponds to energy
consumption and therefore should be minimized. Clearly, the convergence time will depend on the graph connectivity,
and we expect well-connected graphs to spread information faster and hence to require fewer messages to converge.

This question was first analyzed for the complete graph in [12], where it was shown that $\Theta(n \log \epsilon^{-1})$ gossip
messages need to be exchanged to converge to the global average within $\epsilon$ accuracy. Boyd et al. [3] analyzed the
convergence time of standard gossip for any graph and showed that it is closely linked to the mixing time of a
Markov chain defined on the communication graph. They further addressed the problem of optimizing the neighbor
selection probabilities to accelerate convergence.

For certain types of well-connected graphs (including expanders and small world graphs), standard gossip
converges very quickly, requiring the same number of messages ($\Theta(n \log \epsilon^{-1})$) as the fully connected graph. Note
that any algorithm that averages n numbers will require $\Omega(n)$ messages.

Unfortunately, for random geometric graphs and grids, which are the relevant topologies for large wireless ad-hoc
and sensor networks, standard gossip is extremely wasteful in terms of communication requirements. For instance,
even optimized standard gossip algorithms on grids converge very slowly, requiring $\Theta(n^2 \log \epsilon^{-1})$ messages [3],
[8]. Observe that this is of the same order as the energy required for every node to flood its estimate to all other
nodes. On the contrary, the obvious solution of averaging numbers on a spanning tree and flooding back the average
to all the nodes requires only $O(n)$ messages. Clearly, constructing and maintaining a spanning tree in dynamic
and ad-hoc networks introduces significant overhead and complexity, but a quadratic number of messages is a high
price to pay for fault tolerance.

Recently Dimakis et al. [8] proposed geographic gossip, an alternative gossip scheme that reduces the number of
required messages to $\Theta(n^{1.5} \log \epsilon^{-1} / \sqrt{\log n})$, with slightly more complexity at the nodes. Assuming that the nodes have
knowledge of their geographic location and under some assumptions on the network topology, greedy geographic
routing can be used to build an overlay network where any pair of nodes can communicate. The overlay network
is a complete graph on which standard gossip converges with $\Theta(n \log \epsilon^{-1})$ iterations. At each iteration we perform
greedy routing, which costs $\Theta(\sqrt{n / \log n})$ messages on a geometric random graph. In total, geographic gossip thus
requires $\Theta(n^{1.5} \log \epsilon^{-1} / \sqrt{\log n})$ messages.

Li and Dai [13] recently proposed Location-Aided Distributed Averaging (LADA), a scheme that uses partial
locations and Markov chain lifting to create fast gossiping algorithms. The cluster-based LADA algorithm performs
slightly better than geographic gossip, requiring $\Theta(n^{1.5} \log \epsilon^{-1} / (\log n)^{1.5})$ messages for random geometric graphs.
While the theoretical machinery is different, LADA algorithms also use directionality to accelerate gossip, but can
operate even with partial location information and have smaller total delay compared to geographic gossip, at the
cost of a somewhat more complicated algorithm.

This paper: We investigate the performance of path averaging, which is the same algorithm as geographic gossip 
with the additional modification of averaging all the nodes on the routed paths. Observe that averaging the whole 
route comes almost for free in multihop communication, because a packet can accumulate the sum and the number 
of nodes visited, compute the average when it reaches its final destination and follow the same route backwards to
disseminate the average to all the nodes along this route. 

In path averaging, the selection of the routed path (and hence the routing algorithm) will affect the performance 
of the algorithm. We start this paper by experimentally observing that the number of messages for grids and random 
geometric graphs seems to scale linearly when random greedy routing is used. 

The mathematical analysis of path averaging with greedy routing is highly complex because the number of possible
routes increases exponentially in the number of nodes. To make the analysis tractable we make two simplifications:
a) we eliminate edge effects by assuming a grid or random geometric graph on a torus; b) we use box-greedy
routing, a scheme very similar to greedy routing with the extra restriction that each hop is guaranteed to be within
a virtual box that is not too close or too far from the existing node. Box-greedy routing (described in Section III-D)
can be implemented in a distributed way if each node knows its location, the locations of its one-hop neighbors,
and the total number of nodes n. We call path averaging with box-greedy routing box-path averaging.

The main result of this paper is that geographic gossip with path averaging requires $O(n)$ messages under these
assumptions. Further, we present experimental evidence that suggests that this optimal behavior is preserved even 
when different routing algorithms are used. 

The remainder of this paper is organized as follows: in Section II we define our time and network models, give
a precise definition of gossip algorithms and explain our metrics to evaluate the performance of gossip algorithms.
In Section III we describe path averaging with greedy routing and show its excellent performance in simulations.
We also define path averaging with box-greedy routing (box-path averaging), whose analysis is tractable and gives
insight on general gossip algorithms. In Section IV we present the technical tools we use to theoretically show the
efficiency of box-path averaging. We show that the methodology developed in that section is general, simple and
insightful. Section IV-D states our results and outlines the proofs, which can be found in the Appendix.

II. Background and Metrics 

A. Time model 

We use the asynchronous time model [1], [3], which is well-matched to the distributed nature of sensor networks. 
In particular, we assume that each sensor has an independent clock whose "ticks" are distributed as a rate $\lambda$ Poisson
process. However, our analysis is based on measuring time in terms of the number of ticks of an equivalent single
virtual global clock ticking according to a rate $n\lambda$ Poisson process. An exact analysis of the time model can be
found in [3]. We will refer to the time between two consecutive clock ticks as one timeslot. 

Throughout this paper we will be interested in minimizing the number of messages without worrying about delay. 
We can therefore adjust the length of the timeslots relative to the communication time so that only one packet exists in 
the network at each timeslot with high probability. Note that this assumption is made only for analytical convenience; 
in a practical implementation, several packets might co-exist in the network, but the associated congestion control 
issues are beyond the scope of this work.

B. Network model

We model the wireless networks as random geometric graphs (RGG), following standard modeling
assumptions [11], [18]. A random geometric graph $G(n,r)$ is formed by choosing n node locations uniformly and
independently in the unit square, with any pair of nodes i and j connected if their Euclidean distance is smaller
than some transmission radius r (see Fig. 1). It is well known [18], [11], [10] that in order to maintain connectivity
and to minimize interference, the transmission radius $r(n)$ should scale like $r(n) = \sqrt{c \log n / n}$. For the purposes
of analysis, we assume that communication within this transmission radius always succeeds.^1 Note that we assume
that the messages involve real numbers; the effects of message quantization in gossip and consensus algorithms are
an active area of research (see for example [17], [25]).
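
To make the model concrete, here is a minimal sketch of sampling such a graph. The function name `random_geometric_graph` is hypothetical, the default c = 4.5 is simply one of the values used in the simulations reported later, and the quadratic pairwise scan is chosen for clarity rather than speed.

```python
import math
import random

def random_geometric_graph(n, c=4.5, seed=0):
    """Sample G(n, r) in the unit square with r = sqrt(c * log(n) / n)."""
    rng = random.Random(seed)
    r = math.sqrt(c * math.log(n) / n)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    nbrs = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pos[i], pos[j]) < r:   # Euclidean distance below r
                nbrs[i].append(j)
                nbrs[j].append(i)
    return pos, nbrs, r

pos, nbrs, r = random_geometric_graph(100)
```

The returned adjacency lists are symmetric by construction, which matches the assumption that communication within the radius always succeeds.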

In the Appendix we show a slightly stronger condition than connectivity, on how the scaling coefficient c in $r(n)$
tunes the regularity of random geometric graphs. The result states that, if c > 10, then a random geometric graph is
regular with high probability when n is large. Regular geometric graphs are random geometric graphs with degrees
bounded above and below. In particular, select constants $\alpha < a < b$, draw a random geometric graph and divide the
unit square into squares of size $a \log n / n$. If each square contains between $\alpha \log n$ and $b \log n$ nodes, then the graph
is called regular. One standard result [11], [18] is that for a suitable constant a, each of these squares will contain
one or more nodes with high probability (w.h.p.). In the Appendix we prove a slightly stronger regularity condition:
if a > 2, the number of nodes in each square will be $\Theta(\log n)$, i.e. the random geometric graphs
are regular geometric graphs w.h.p. In Section III-D we assume that our network is a regular geometric graph
embedded on a torus, and we ensure that any node in a square is able to communicate with any other node of its
four neighboring squares by setting c > 10.

C. Gossip algorithms 

Gossip is a class of distributed averaging algorithms, where average consensus can be reached up to any desired
level of accuracy by iteratively averaging small random groups of estimates. At time-slot t = 0, 1, 2, ..., each node
i = 1, ..., n has an estimate $x_i(t)$ of the global average. We use $x(t)$ to denote the n-vector of these estimates and
therefore $x(0)$ gathers the initial values to be averaged. The ultimate goal is to drive the estimate $x(t)$ to the vector
of averages $x_{ave}\mathbf{1}$, where $x_{ave} = \frac{1}{n}\sum_{i=1}^{n} x_i(0)$ and $\mathbf{1}$ is the n-vector of ones. In gossip, at each time-slot t, a
random set $S(t)$ of nodes communicate with each other and update their estimates to the average of the estimates
of $S(t)$: for all $j \in S(t)$, $x_j(t+1) = \frac{1}{|S(t)|}\sum_{i \in S(t)} x_i(t)$. In standard gossip (nearest neighbor) and in geographic
gossip, only random pairs of nodes average their estimates, hence $S(t)$ always contains exactly two nodes. On the
other hand, in path averaging, $S(t)$ is the set of nodes in the random route generated at each time-slot t. Therefore
in this case, $S(t)$ contains a random number of nodes.
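
The update rule is the same for every scheme discussed here; only the choice of the averaging set changes. A minimal sketch, with a hypothetical helper name `gossip_update` and made-up initial values:

```python
def gossip_update(x, S):
    """Replace the estimates of all nodes in S by their joint average."""
    avg = sum(x[j] for j in S) / len(S)
    for j in S:
        x[j] = avg
    return x

x = [4.0, 0.0, 2.0, 6.0, 8.0]
gossip_update(x, [0, 1])        # pairwise step, as in standard or geographic gossip
gossip_update(x, [1, 2, 3])     # path step, as in path averaging
```

Both steps leave the sum of the estimates unchanged, which is why the global average is preserved from round to round.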

^1 However, we note that our proposed algorithm remains robust to communication and node failures.

Fig. 1. Random geometric graph example. The connectivity radius is $r(n)$.



D. Metrics for convergence time and message cost 

We measure the performance of gossip algorithms with a metric that was recently introduced in [6]. Instead of
defining convergence time as the time $T_{ave}$ elapsed until the error metric becomes smaller than $\epsilon$ with probability
$1 - \epsilon$ (see Eq. (2)) as in [3], we define it as the time $T_c$ by which the error metric is divided by a factor e (the base
of the natural logarithm) with probability 1 in the long run. Apart from giving an almost sure criterion for convergence
time, consensus time $T_c$ also conveniently lightens the formalism by removing the $\epsilon$'s.

For the algorithms of interest, the estimate vector $x(t)$ and the error vector $\varepsilon(t) = x(t) - x_{ave}\mathbf{1}$ for $t > 0$ are
random. However, in the long run, the error decays exponentially with a deterministic rate $1/T_c$, where $T_c$, called
consensus time, is defined as follows [6]:

Theorem 1: Consensus time $T_c$. If $\{S(t)\}_{t \ge 0}$ is an independently and identically distributed (i.i.d.) process, then
the limit

$-\frac{1}{T_c} = \lim_{t \to \infty} \frac{1}{t} \log \|\varepsilon(t)\|$, (1)

where $\|\cdot\|$ denotes the $\ell_2$ norm, exists and is a constant with probability 1.

In other words, after a transient regime, the number of iterations needed to reduce the error $\|\varepsilon\|$ by a factor e is
almost surely equal to $T_c$, which therefore characterizes the speed of convergence of the algorithm. $T_c$ is easy to
measure in experiments, and can be theoretically upper bounded. However, lower bounding this quantity remains
an open problem.
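
As a rough illustration of how the consensus time can be measured in an experiment, the sketch below estimates it from the slope of the log-error between two time-slots. Everything specific here is an assumption made for the example: pairwise gossip on a complete graph, n = 20, and the sampling times t1 = 100 and t2 = 400 (chosen so that the error stays well above floating-point precision).

```python
import math
import random

def error_norm(x):
    """l2 norm of the deviation of x from its own average."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x))

def estimate_consensus_time(n=20, t1=100, t2=400, seed=1):
    """Two-point estimate of the consensus time for pairwise gossip on a complete graph."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    log_err = {}
    for t in range(t2 + 1):
        if t in (t1, t2):
            log_err[t] = math.log(error_norm(x))
        i, j = rng.sample(range(n), 2)        # a uniformly random pair averages
        x[i] = x[j] = (x[i] + x[j]) / 2
    # The log-error decreases by about one unit every T_c time-slots.
    return (t1 - t2) / (log_err[t2] - log_err[t1])

T_c = estimate_consensus_time()
```

A single run gives a noisy estimate; averaging over several seeds and graphs would be the natural refinement.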

Previous work defined the $\epsilon$-averaging time $T_{ave}(\epsilon)$, another quantity describing speed of convergence [3] (see
also [9] for a related analysis):

Definition 1: $\epsilon$-averaging time $T_{ave}(\epsilon)$. Given $\epsilon > 0$, the $\epsilon$-averaging time is the earliest time at which the vector
$x(t)$ is $\epsilon$ close to the normalized true average with probability greater than $1 - \epsilon$:

$T_{ave}(\epsilon) = \sup_{x(0)} \inf \left\{ t : \mathbb{P}\left( \frac{\|x(t) - x_{ave}\mathbf{1}\|}{\|x(0)\|} \ge \epsilon \right) \le \epsilon \right\}$. (2)

Although $T_{ave}(\epsilon)$ is hard to measure in practice because it requires the evaluation of an infinite number of
probabilities, it is easily upper and lower bounded theoretically in terms of the spectral gap (see Section IV).
Indeed $T_{ave}(\epsilon)$ contains a probability tolerance $\epsilon$ in its definition, which greatly facilitates its analysis. On the
contrary, $T_c$ is hard to analyze theoretically because it is constrained by the exigency of its inherent determinism.
An important issue is the behavior of $T_c$ and $T_{ave}$ as the number n of nodes in the network grows. It can be
shown that $T_c(n) = O(T_{ave}(n, \epsilon))$ for any fixed $\epsilon$, but whether the two quantities are equivalent and under which
conditions is still an open problem. Previous theoretical results summarized in Table I refer to $\epsilon$-averaging time.

We compare algorithms in terms of the amount of required communication. More specifically, let $R(t)$ represent
the number of one-hop radio transmissions required in time-slot t. In a standard gossip protocol, the quantity
$R(t) \equiv R$ is simply a constant, whereas for our protocol, $\{R(t)\}_{t \ge 0}$ will be a sequence of i.i.d. random variables.

The total communication cost at time-slot t, measured in one-hop transmissions, is given by the random variable
$C(t) = \sum_{k=1}^{t} R(k)$. Consensus cost $C_c$ is defined as follows [6]:

Theorem 2: Consensus cost $C_c$. If $\{S(t)\}_{t \ge 0}$ is an independently and identically distributed (i.i.d.) process, then
the following limit exists and is a constant with probability 1:

$-\frac{1}{C_c} = \lim_{t \to \infty} \frac{1}{C(t)} \log \|\varepsilon(t)\|$.

Thus, $C_c = \mathbb{E}[R(1)] T_c$ is the number of one-hop transmissions needed in the long run to reduce the error by a
factor e with probability 1.



Similarly, we define the expected $\epsilon$-averaging cost $C_{ave}(\epsilon)$ to be the expected communication cost in the first $T_{ave}(\epsilon)$
iterations of the algorithm: $C_{ave}(\epsilon) = \mathbb{E}[C(T_{ave}(\epsilon))] = \mathbb{E}[R(1)] T_{ave}(\epsilon)$.



A. Path averaging on random geometric graphs. 

The proposed algorithm combines gossip with random greedy geographic routing. A key assumption is that each 
node knows its location and is able to learn the geographic locations of its one-hop neighbors (for example using 
a single transmission per node). Also the nodes need to know the size of the space they are embedded in. Note 
that while our results are developed for random geometric topologies, the algorithm can be applied on any set of
nodes embedded on some compact and convex region. 

The algorithm operates as follows: at each time-slot one random node activates and selects a random position
(target) on the unit square region where the nodes are spread out. Note that no node needs to be located on the
target, since this would require global knowledge of locations. The node then creates a packet that contains its
current estimate of the average, its position, the number of visited nodes so far (one), the target location, and passes
the packet to a neighbor that is randomly chosen among its neighbors closer to the target. As nodes receive the
packet, randomly and greedily forwarding it towards the target, they add their value to the sum and increase the hop
counter. When the packet reaches its destination node (the first node whose nearest neighbors have larger distance
to the target compared to it), the destination node computes the average of all the nodes on the path, and reroutes
that information backwards on the same route. See Fig. 2 for an illustration of random greedy routing. It is not
hard to show [8] that for $G(n, r)$ when r scales like $\Theta(\sqrt{\log n / n})$, greedy forwarding succeeds in reaching the closest
node to the random target with high probability over graphs; in other words, there are no large 'holes' in the
network. We will refer to this whole procedure of routing a message and averaging on a random path as one gossip
round which lasts for one time-slot, after which $O(\sqrt{n / \log n})$ nodes will replace their estimates with their joint
average. We prefer not to route the estimates by choosing the next node as the closest neighbor to the target, but
as one random neighbor closer to the target, because we observed that the latter is cheaper (smaller $C_c$). Note that
the nodes do not need to know the number of nodes n in the network; they only need the size of the field on which
they are deployed.

Fig. 2. Random greedy routing. Node i has to choose the next node in the route among the nodes that are its neighbors (inside the ball
of radius $r(n)$ centered at node i) and that are closer to the target than i (inside the ball of radius d centered at the target, where d is the distance
between node i and the target). The next node is thus randomly chosen in the intersection of the two balls.
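
One gossip round of path averaging with random greedy routing can be sketched as follows. This is an illustrative simulation rather than the authors' implementation: the function names, the list-based graph representation and the small example deployment (n = 200, c = 4.5) are all assumptions, and the packet is modeled simply as a running sum and a hop counter.

```python
import math
import random

def random_greedy_route(pos, nbrs, start, target, rng):
    """Forward towards `target`, each hop going to a random neighbor that is closer."""
    path = [start]
    while True:
        cur = path[-1]
        d_cur = math.dist(pos[cur], target)
        closer = [v for v in nbrs[cur] if math.dist(pos[v], target) < d_cur]
        if not closer:          # no neighbor is closer: cur is the destination
            return path
        path.append(rng.choice(closer))

def path_averaging_round(x, pos, nbrs, rng):
    """One gossip round: route to a random target, then average along the path."""
    start = rng.randrange(len(x))
    target = (rng.random(), rng.random())
    path = random_greedy_route(pos, nbrs, start, target, rng)
    total, hops = 0.0, 0
    for v in path:              # forward pass: accumulate sum and node count
        total += x[v]
        hops += 1
    avg = total / hops
    for v in reversed(path):    # backward pass: disseminate the average
        x[v] = avg
    return path

# Hypothetical deployment: n nodes in the unit square, radius sqrt(c log n / n).
rng = random.Random(0)
n, c = 200, 4.5
r = math.sqrt(c * math.log(n) / n)
pos = [(rng.random(), rng.random()) for _ in range(n)]
nbrs = [[j for j in range(n) if j != i and math.dist(pos[i], pos[j]) < r] for i in range(n)]
x = [rng.random() for _ in range(n)]
total_before = sum(x)
last_path = None
for _ in range(50):
    last_path = path_averaging_round(x, pos, nbrs, rng)
```

Because every hop strictly decreases the distance to the target, a route never visits a node twice and always terminates.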

B. Motivation-Performance simulations 

We experimentally measured $T_c$ and $C_c$ in order to evaluate the performance of path averaging on random
geometric graphs with a growing number n of nodes in the unit square. Fig. 3(b) shows that our algorithm behaves
strikingly better than standard gossip and geographic gossip, when, for example, $r(n) = \sqrt{c \log n / n}$ with c = 4.5.
For other values of c, the performance of our algorithm also greatly improves on previous gossip schemes. Most
importantly, for small connection radius $r(n)$ (small c), the number of messages $C_c$ behaves almost linearly in n
(see Fig. 3(c)), and as c increases, the behavior improves (see Fig. 3(d)). The slight super-linearity in Fig. 3(c) is
due to small $r(n)$ and possibly edge effects. Clearly, we cannot expect better than linear behavior in n because
at least n messages are necessary to average n values. Therefore path averaging with greedy routing seems to be
optimal for sufficiently large constant c.

Fig. 3. Performance of path averaging. The simulations were performed over 15 graphs per n. Averaging time was measured here by
$T_c = (t_1 - t_2) / [\log \|\varepsilon(t_2)\| - \log \|\varepsilon(t_1)\|]$ for $t_1 = 500$ and $t_2 = 1750$. (a) The mean route length in random greedy
routing behaves as $\sqrt{n / \log n}$. (b) Comparison between standard gossip, geographic gossip (without rejection sampling) and
path averaging with $r(n) = \sqrt{4.5 \log n / n}$. (c), (d) Consensus costs $C_c = \mathbb{E}[R] T_c$ for radii $r(n) = \sqrt{4.5 \log n / n}$ and
$r(n) = \sqrt{25 \log n / n}$.

Unfortunately, the theoretical analysis of path averaging with greedy routing seems intractable. However, with a
slight modification in the routing algorithm, and by ignoring edge effects, we are able to analyze path averaging,
first for grids and then for regular geometric graphs. Recall that random geometric graphs are regular geometric
graphs with high probability when n is large if c is sufficiently large (Section II-B).

C. $(\leftrightarrow, \updownarrow)$-path averaging on grids

The first step in our analysis is understanding the behavior of path averaging on regular grids using a simple
routing scheme. Throughout this paper, a grid of n nodes will be a 4-connected lattice on a torus of size $\sqrt{n} \times \sqrt{n}$.
$(\leftrightarrow, \updownarrow)$-path averaging performs as follows: At each iteration t, a randomly selected node I wakes up and selects
a random destination node J so that the pair (I, J) is independently and uniformly distributed. Node I also flips
a fair coin to choose the first direction: horizontal ($\leftrightarrow$) or vertical ($\updownarrow$). If for instance horizontal was picked as the
first direction, the path between I and J is then defined by the shortest horizontal-vertical route between I and J
(see Fig. 4(a)). The estimates of all the nodes on this path are aggregated and averaged by messages passed on this
path, and at the end of the iteration the estimates of the nodes on this path are updated to their joint average.
Clearly, this message-passing procedure can be executed if each node knows its location on the grid.

Fig. 4. (a) Shortest $(\leftrightarrow, \updownarrow)$-route from I to J on the grid. (b) Example of box-path averaging on an RGG: The node with initial value 3 selects
a random position and places a target. Using $(\leftrightarrow, \updownarrow)$-box routing towards that target, all the nodes on the path replace their values with the
average of the four nodes.
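
The route construction on the grid torus can be sketched as below. The function names are hypothetical, nodes are identified by (row, column) pairs on an m x m torus with m = sqrt(n), and ties between the two ways around the torus are broken towards increasing indices, which is an arbitrary choice made for this example.

```python
def torus_steps(a, b, m):
    """Indices visited going from a to b the shorter way around a cycle of length m."""
    forward = (b - a) % m
    step = 1 if forward <= m - forward else -1
    count = forward if step == 1 else m - forward
    return [(a + step * k) % m for k in range(1, count + 1)]

def grid_route(src, dst, m, horizontal_first=True):
    """Shortest horizontal-vertical (or vertical-horizontal) route on an m x m torus."""
    (r0, c0), (r1, c1) = src, dst
    path = [src]
    if horizontal_first:
        path += [(r0, c) for c in torus_steps(c0, c1, m)]   # move along the row first
        path += [(r, c1) for r in torus_steps(r0, r1, m)]   # then along the column
    else:
        path += [(r, c0) for r in torus_steps(r0, r1, m)]
        path += [(r1, c) for c in torus_steps(c0, c1, m)]
    return path

route = grid_route((0, 0), (2, 3), m=5, horizontal_first=True)
```

In this example the horizontal leg wraps around the torus, since going from column 0 to column 3 takes two steps backwards but three steps forwards.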

D. Box-path averaging on regular geometric graphs 

As seen in Section II-B, a regular geometric graph can be organized in virtual squares with the transmission
radius $r(n)$ selected so that a node can pass messages to any node in the four squares adjacent to its own square.

In box-path averaging, when a node activates, it chooses uniformly at random a target location in the unit torus
and its initial direction: horizontal or vertical. Then a node is selected uniformly from the ones in the adjacent
square in the right direction. (Recall that regularity ensures that w.h.p. $\Theta(\log n)$ nodes will be in each square.) The
routing stops when the message reaches a node in the square where the target is located. As in the previous path
averaging algorithms, the estimates of all the nodes on the path are averaged and all the nodes replace their values
with this estimate (see Fig. 4(b)). The key point is that box-path averaging can be executed if each node knows
its location, the locations of its one-hop neighbors and the total number of nodes n, because with this knowledge
each node can figure out which square it belongs to and pass messages appropriately.

Fig. 5. Choosing the next node in the route. On the left: random greedy routing; on the right: $(\leftrightarrow, \updownarrow)$-box routing. It is easy to see that the two
choice areas contain on average $\Theta(\log n)$ nodes.
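
The local computation a node needs for box routing can be sketched as follows: from n and its own position it derives its virtual square, and from the target's square it derives the next square to forward to. The function names and the constant a = 2.5 are hypothetical, and rounding the number of squares per side down to an integer is a simplification made for this example.

```python
import math

def squares_per_side(n, a=2.5):
    """Number of virtual squares along each side, for squares of area about a*log(n)/n."""
    return max(1, math.floor(math.sqrt(n / (a * math.log(n)))))

def square_of(position, n, a=2.5):
    """(column, row) index of the virtual square containing a point of the unit square."""
    k = squares_per_side(n, a)
    px, py = position
    return (min(int(px * k), k - 1), min(int(py * k), k - 1))

def step_towards(cur, tgt, k):
    """One-square move from index cur towards tgt, the shorter way around a cycle of k squares."""
    if cur == tgt:
        return cur
    forward = (tgt - cur) % k
    return (cur + 1) % k if forward <= k - forward else (cur - 1) % k

k = squares_per_side(1000)
```

A node would apply `step_towards` to the horizontal index until it matches the target's, then to the vertical index (or the other way round, depending on the initial coin flip), and forward the packet to a uniformly chosen node of the resulting adjacent square.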

Box-greedy routing is a regularized version of random greedy routing, and is introduced to make the analysis
tractable. Both routing schemes proceed by choosing the next hop among $\Theta(\log n)$ nodes (Fig. 5). Box-greedy
routing generates routes with $\Theta(\sqrt{n / \log n})$ hops on average, and random greedy routing does as well in experiments
(Fig. 3(a)). We are now ready to start the theoretical analysis of the aforementioned path averaging algorithms.

IV. Analysis 

A. Averaging and eigenvalues. 

Let $x(t)$ denote the vector of estimates of the global averages after the t-th gossip round, where $x(0)$ is the vector
of initial measurements. Any gossip algorithm can be described by an equation of the form

$x(t+1) = W(t)x(t)$, (3)

where $W(t)$ is the averaging matrix over the t-th time-slot.

We say that the algorithm converges almost surely (a.s.) if $\mathbb{P}[\lim_{t \to \infty} x(t) = x_{ave}\mathbf{1}] = 1$. It converges in
expectation if $\lim_{t \to \infty} \mathbb{E}[x(t) - x_{ave}\mathbf{1}] = 0$, and there is mean square convergence if $\lim_{t \to \infty} \mathbb{E}[\|x(t) - x_{ave}\mathbf{1}\|^2] = 0$.
There are two necessary conditions for convergence:

$\mathbf{1}^T W(t) = \mathbf{1}^T$, $W(t)\mathbf{1} = \mathbf{1}$, (4)

which respectively ensure that the average is preserved at every iteration, and that $\mathbf{1}$ is a fixed point. For any linear
distributed averaging algorithm following (3) where $\{W(t)\}_{t \ge 0}$ is i.i.d., conditions for convergence in expectation
and in mean square can be found in [2]. In gossip algorithms, the matrices $W(t)$ are symmetric projection matrices. Taking
into account this particularity, we can state specific conditions for convergence. Let $\lambda_2(\mathbb{E}[W])$ be the second largest
eigenvalue in magnitude of the expectation of the averaging matrix $\mathbb{E}[W] = \mathbb{E}[W(t)]$. If condition (4) holds and if
$\lambda_2(\mathbb{E}[W]) < 1$, then $x(t)$ converges to $x_{ave}\mathbf{1}$ in expectation and in mean square.

In the case where $\{W(t)\}_{t \ge 0}$ is stationary and ergodic (and thus in particular when $\{W(t)\}_{t \ge 0}$ is i.i.d.), sufficient
conditions for a.s. convergence can be proven [5]: if the gossip communication network is connected, then the
estimates of gossip converge to the global average $x_{ave}$ with probability 1. More precisely, define
$T_\gamma := \inf\{t \ge 1 : \prod_{p=0}^{t} W(t-p) \ge \gamma > 0\}$. $T_\gamma$ is a stopping time. If $\mathbb{E}[T_\gamma] < \infty$, then the estimates converge to the global
average with probability 1. In other words, every node has to eventually connect to the network, which has to be
jointly connected.

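Interestingly, the value of $\lambda_2(\mathbb{E}[W])$, which appears in the criteria of convergence in expectation and of mean
square convergence, controls the speed of convergence:

$T_c(\mathbb{E}[W]) \le \frac{2}{\log(1/\lambda_2(\mathbb{E}[W]))} \le \frac{2}{1 - \lambda_2(\mathbb{E}[W])}$. (5)

A straightforward extension of the proof of Boyd et al. [3] from the case of pairwise averaging matrices to the
case of symmetric projection averaging matrices yields the following bound on the $\epsilon$-averaging time, which also
involves $\lambda_2(\mathbb{E}[W])$:

$T_{ave}(\epsilon, \mathbb{E}[W]) \le \frac{3 \log \epsilon^{-1}}{\log(1/\lambda_2(\mathbb{E}[W]))} \le \frac{3 \log \epsilon^{-1}}{1 - \lambda_2(\mathbb{E}[W])}$. (6)

There is also a lower bound of the same order, which implies that $T_{ave}(\epsilon, \mathbb{E}[W]) = \Theta(\log \epsilon^{-1} / (1 - \lambda_2(\mathbb{E}[W])))$.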

Consequently, the rate at which the spectral gap $1 - \lambda_2(\mathbb{E}[W])$ approaches zero as n increases controls both
the $\epsilon$-averaging time $T_{ave}$ and the consensus time $T_c$. For example, in the case of a complete graph and uniform
pairwise gossiping, one can show that $\lambda_2(\mathbb{E}[W]) = 1 - 1/n$. Therefore, as previously mentioned, the consensus
time of this scheme is $O(n)$. In pairwise gossiping, the convergence time and the number of messages have the
same order because there is a constant number R of transmissions per time-slot. In geographic gossip and in path
averaging on random geometric graphs, one round uses many messages for the path routing ($\sqrt{n / \log n}$ messages
on average), hence multiplying the order of consensus time $T_c(n)$ by $\sqrt{n / \log n}$ gives the order of consensus cost
$C_c(n)$.
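
The complete-graph value of the second eigenvalue can be checked numerically. The sketch below makes one modeling assumption explicit: the averaging pair (I, J) is drawn uniformly from all n*n ordered pairs, so a node may occasionally pick itself. The helper names are hypothetical, and the eigenvalue is obtained by a plain power iteration restricted to vectors orthogonal to the all-ones vector.

```python
import random

def expected_pairwise_matrix(n):
    """E[W] when (I, J) is uniform over all n*n ordered pairs (I = J allowed)."""
    ew = [[0.0] * n for _ in range(n)]
    p = 1.0 / (n * n)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if k != i and k != j:
                    ew[k][k] += p              # untouched nodes keep their value
            if i == j:
                ew[i][i] += p                  # a node averaging with itself
            else:
                for a in (i, j):
                    for b in (i, j):
                        ew[a][b] += p / 2      # both nodes take the pairwise mean
    return ew

def second_eigenvalue(m, iters=200, seed=0):
    """Largest |eigenvalue| of a symmetric doubly stochastic matrix orthogonally to the ones vector."""
    n = len(m)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        mean = sum(v) / n
        v = [a - mean for a in v]              # project out the all-ones direction
        norm = sum(a * a for a in v) ** 0.5
        v = [a / norm for a in v]
        w = [sum(m[r][c] * v[c] for c in range(n)) for r in range(n)]
        lam = sum(a * b for a, b in zip(v, w)) # Rayleigh quotient
        v = w
    return abs(lam)

lam2 = second_eigenvalue(expected_pairwise_matrix(8))
```

Under this pair-selection model the result agrees with 1 - 1/n; if the two nodes are forced to be distinct, the same computation gives 1 - 1/(n - 1) instead, which has the same order.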

B. The travel agency method 

A direct consequence of the previous section is that the evaluation of consensus time requires an accurate upper
bound on $\lambda_2(\mathbb{E}[W])$. Consequently, computing the averaging time of a scheme takes two steps: (1) evaluation
of $\mathbb{E}[W]$, (2) upper bounding its second largest eigenvalue in magnitude. $\mathbb{E}[W]$ is a doubly stochastic matrix that
corresponds to a time-reversible Markov chain.

We can therefore use techniques developed for bounding the spectral gap of Markov Chains to bound the 
convergence time of gossip. In particular, we will use Poincaré's inequality by Diaconis and Stroock [7] (see
also [4], p. 212-213 and the related canonical paths technique [23]) to develop a bounding technique for gossip. 

Theorem 3 (Poincaré's inequality [7]): Let P denote an n x n irreducible and reversible stochastic matrix, and
$\pi$ its left eigenvector associated to the eigenvalue 1 ($\pi^T P = \pi^T$) such that $\sum_{i=1}^{n} \pi(i) = 1$. A pair e = (k, l) is
called an edge if $P_{kl} \ne 0$. For each ordered pair (i, j) where $1 \le i, j \le n$, $i \ne j$, choose one and only one path
$\gamma_{ij} = (i, i_1, \ldots, i_m, j)$ between i and j such that $(i, i_1), (i_1, i_2), \ldots, (i_m, j)$ are all edges. Define

$|\gamma_{ij}| = \frac{1}{\pi(i) P_{i i_1}} + \frac{1}{\pi(i_1) P_{i_1 i_2}} + \cdots + \frac{1}{\pi(i_m) P_{i_m j}}$. (7)

The Poincaré coefficient is defined as

$\kappa = \max_{\text{edge } e} \sum_{\gamma_{ij} \ni e} |\gamma_{ij}| \, \pi(i) \pi(j)$. (8)

Then the second largest eigenvalue of P verifies

$\lambda_2(P) \le 1 - \frac{1}{\kappa}$. (9)

We will apply this theorem with $P = \mathbb{E}[W]$. Here $\pi(i) = 1/n$ for all $1 \le i \le n$.
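
To see the theorem at work on a case where everything is explicit, the sketch below evaluates the Poincaré coefficient for uniform pairwise gossip on the complete graph, routing each ordered pair over its direct edge. The helper name `poincare_coefficient` is hypothetical, edges are treated as ordered pairs, and the matrix P is written down under the assumption that the averaging pair is drawn uniformly from all n*n ordered pairs.

```python
def poincare_coefficient(P, pi, paths):
    """Max over edges e of the sum of |gamma_ij| * pi(i) * pi(j) over the paths through e.

    paths maps each ordered pair (i, j), i != j, to a node list [i, ..., j].
    """
    load = {}
    for (i, j), nodes in paths.items():
        edges = list(zip(nodes, nodes[1:]))
        # |gamma_ij|: sum of 1 / (pi(a) * P[a][b]) over the edges (a, b) of the path
        length = sum(1.0 / (pi[a] * P[a][b]) for a, b in edges)
        for e in edges:
            load[e] = load.get(e, 0.0) + length * pi[i] * pi[j]
    return max(load.values())

n = 6
# E[W] for uniform pairwise gossip on the complete graph (assumed pair-selection model).
P = [[(1 - 1 / n + 1 / n**2) if a == b else 1 / n**2 for b in range(n)] for a in range(n)]
pi = [1 / n] * n
paths = {(i, j): [i, j] for i in range(n) for j in range(n) if i != j}
kappa = poincare_coefficient(P, pi, paths)
bound = 1 - 1 / kappa   # upper bound on the second largest eigenvalue
```

With direct paths each edge carries exactly one route, and the resulting bound 1 - 1/n is of the same order as the spectral gap of this scheme; other topologies need longer paths, and the art lies in spreading them so that no single edge is overloaded.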

The combination of Poincaré's inequality with bounds (5) and (6) forms a versatile technique for bounding the
performance of gossip algorithms that we call the travel agency method. It is crucial to understand that the edges
used in the application of the theorem are abstract and do not correspond to actual edges in the physical network.
They instead correspond to paths on which there is joint averaging, and hence information flow, through
message-passing. Consider the following analogy. Imagine that n airports are positioned at the locations of the nodes of
the network. In this scenario, we are given a table $P = \mathbb{E}[W]$ of the flight capacities (number of passengers per
time unit) between any pair of airports among the n airports. A good averaging intensity $\mathbb{E}[W_{ij}]$ between nodes
i and j corresponds to a good capacity flight between airports i and j in the travel agency method. Here edges e
are existing flights and, in our specific case, there is the same number of travelers in all the airports ($\pi(i) = 1/n$
for all i). We are asked to design one and only one road map $\gamma_{ij}$ between each pair of airports i and j that avoids
congestion and multiple hops. $|\gamma_{ij}|$ measures the level of congestion between airport i and airport j. The theorem
tells us that if we can come up with a road map that avoids significant congestion on the worst flight (i.e. if $\kappa$ is
small), then we will have proven that the flying network is efficient ($\lambda_2$ is small). The previous bounds (5) and (6) can
now be used to bound the consensus time and consensus cost.

One of the important benefits of this bounding technique is that we do not need to know the exact entries of $\mathbb{E}[W]$ to bound the averaging cost: good lower bounds suffice. In terms of the analogy, we only need to know that each flight $(i, j)$ has at least capacity $C_{ij}$. If the flight can actually carry more passengers ($P_{ij} \ge C_{ij}$), then our measure of congestion $\kappa$ will be overestimated. While our final upper bounds will not be as tight as they could have been if we had exact knowledge of $\mathbb{E}[W]$, they suffice to establish the optimal asymptotic behavior.
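
The recipe of Theorem 3 can be checked numerically on a small chain. The sketch below is only an illustration: the function names (`poincare_coefficient`, `lazy_cycle`, `shortest_cycle_paths`) and the choice of a lazy random walk on a 9-node cycle are assumptions made for the example, not part of the analysis in this paper. It computes $\kappa$ from Eqs. (7) and (8) for a given road map and compares the bound $1 - 1/\kappa$ of Eq. (9) with the exact second largest eigenvalue.

```python
import numpy as np

def poincare_coefficient(P, pi, paths):
    """Return kappa = max_e sum_{paths through e} |gamma_ij| * pi(i) * pi(j).

    P     : (n, n) reversible stochastic matrix (or entrywise lower bounds on it)
    pi    : stationary distribution of P
    paths : dict mapping each ordered pair (i, j), i != j, to a node list [i, ..., j]
    """
    load = {}  # directed edge -> accumulated congestion
    for (i, j), nodes in paths.items():
        edges = list(zip(nodes[:-1], nodes[1:]))
        # Path "resistance" |gamma_ij| of Eq. (7)
        length = sum(1.0 / (pi[u] * P[u, v]) for u, v in edges)
        for e in edges:
            load[e] = load.get(e, 0.0) + length * pi[i] * pi[j]
    return max(load.values())

def lazy_cycle(n):
    """Lazy random walk on an n-cycle: stay w.p. 1/2, move to each neighbor w.p. 1/4."""
    P = 0.5 * np.eye(n)
    for u in range(n):
        P[u, (u + 1) % n] += 0.25
        P[u, (u - 1) % n] += 0.25
    return P

def shortest_cycle_paths(n):
    """One path per ordered pair, walking around the cycle in the shorter direction."""
    paths = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = (j - i) % n
            step = 1 if d <= n // 2 else -1
            hops = d if step == 1 else n - d
            paths[(i, j)] = [(i + step * t) % n for t in range(hops + 1)]
    return paths

n = 9
P = lazy_cycle(n)
pi = np.full(n, 1.0 / n)
kappa = poincare_coefficient(P, pi, shortest_cycle_paths(n))
lam2 = np.sort(np.linalg.eigvalsh(P))[-2]  # P is symmetric, so eigvalsh applies
print(f"kappa = {kappa:.2f}, bound = {1 - 1 / kappa:.4f}, lambda_2 = {lam2:.4f}")
```

Passing entrywise lower bounds instead of the exact matrix (for instance `0.5 * P`) only increases the returned $\kappa$, which mirrors the remark above that lower bounds on the flight capacities are sufficient.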

C. Example: standard gossip revisited 

In order to illustrate the generality of our technique, we show how to apply it on simple examples, by giving 
sketches of novel proofs for known results on nearest neighbors gossip on the complete graph and on the random 
geometric graph. 

1) Complete graph: For any $i \neq j$, $\mathbb{E}[W_{ij}] = 1/n^2$. Indeed $W_{ij} = 0.5$ when node $i$ wakes up (event of probability $1/n$) and chooses node $j$ (event of probability $1/n$ as well), or when $j$ wakes up and chooses $i$. We now apply the travel agency method. We see in $\mathbb{E}[W]$ that all flights have equal capacity and that there are direct flights between any pair of airports. We choose here the simplest road map one could think of: to go from airport $i$ to airport $j$, each traveller should take the direct hop $\gamma_{ij} = (i, j)$. Then the sum in (7) has only one term: $|\gamma_{ij}| = n^3$. In this case all flights are equal and one flight $e = (i, j)$ belongs to only one road map: $\gamma_{ij}$. Thus the sum in (8) also has only one term and $\kappa = n^3/(n \cdot n) = n$. Therefore $\lambda_2(\mathbb{E}[W]) \le 1 - 1/n$, which proves that $T_c(n) = O(n)$. Note that the complete graph is the overlay network of geographic gossip (every pair of nodes can be averaged at the expense of routing), which thus performs in $C_c(n) = O(n\sqrt{n/\log n})$.
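
As a sanity check of the complete-graph computation, the following sketch builds $\mathbb{E}[W]$ by enumeration and evaluates the direct-flight road map. It assumes, for the sake of the example, that the waking node picks its partner uniformly among the other $n-1$ nodes (the text above uses the approximation $1/n$), so the exact values are $\mathbb{E}[W_{ij}] = 1/(n(n-1))$ and $\kappa = n-1$, which have the same order as $1/n^2$ and $n$. The function name is hypothetical.

```python
import numpy as np

def expected_gossip_matrix_complete(n):
    """E[W] for pairwise gossip on the complete graph (uniform wake-up, uniform partner)."""
    EW = np.zeros((n, n))
    for i in range(n):          # node i wakes up with probability 1/n
        for j in range(n):      # and picks j among the other n - 1 nodes
            if i == j:
                continue
            W = np.eye(n)
            W[i, i] = W[j, j] = W[i, j] = W[j, i] = 0.5
            EW += W / (n * (n - 1))
    return EW

n = 8
EW = expected_gossip_matrix_complete(n)
pi = 1.0 / n
# Direct-flight road map: a single term in |gamma_ij| and one path per flight.
gamma = 1.0 / (pi * EW[0, 1])
kappa = gamma * pi * pi
lam2 = np.sort(np.linalg.eigvalsh(EW))[-2]
print(kappa, 1 - 1 / kappa, lam2)
```

In this symmetric example the Poincaré bound coincides with the exact second eigenvalue, so nothing is lost by the direct-flight road map.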

2) Random geometric graph (RGG): We show in the Appendix that if the connection radius $r(n)$ is large enough, then RGGs are regular with high probability, i.e. the nodes are very regularly spread out in the unit square, which implies that each node has $\Theta(\log n)$ neighbors. To keep the illustration of the travel agency method simple, we assume that the nodes lie on a torus (no border effects). Consider the pair of nodes $(i, j)$. If $i$ and $j$ are not neighbors, then $\mathbb{E}[W_{ij}] = 0$; if $i$ and $j$ are neighbors, then $\mathbb{E}[W_{ij}] = \Theta(1/(n \log n))$ because node $i$ wakes up with probability $1/n$ and chooses node $j$ with probability $\Theta(1/\log n)$. We now have to create a road map with only short distance paths. Regularity ensures that there are no isolated nodes that could create local congestion. We thus naturally decide that the best way to go is to select paths along the straightest possible line between the departure airport and the destination airport. This will require $O(\sqrt{n/\log n})$ hops, therefore the right hand side of Equation (7) is the sum of $O(\sqrt{n/\log n})$ terms, each of equal order:

$$|\gamma_{ij}| = O\!\left(\sqrt{\frac{n}{\log n}}\right) \frac{1}{1/n}\, \Theta\!\left(\frac{1}{1/(n \log n)}\right) = O\!\left(n^2 \sqrt{n \log n}\right). \tag{10}$$

Now we need to compute in how many paths each particular flight is used. It follows from our regularity and torus assumptions that each flight appears in approximately the same number of road maps. There are $n^2$ paths that use $O(\sqrt{n/\log n})$ flights, but there are only $\Theta(n \log n)$ different flights, hence each flight is used in $O((n/\log n)^{3/2})$ paths. We can now compute the Poincaré coefficient $\kappa$. We drop the $\max_e$ argument in Equation (8) because all flights are equal. As $\pi(i) = \pi(j) = 1/n$,

$$\kappa = \sum_{\gamma_{ij} \ni e} O\!\left(n^2 \sqrt{n \log n}\right) \frac{1}{n}\, \frac{1}{n} \tag{11}$$

$$= O\!\left(\left(\frac{n}{\log n}\right)^{3/2}\right) O\!\left(\sqrt{n \log n}\right) \tag{12}$$

$$= O\!\left(\frac{n^2}{\log n}\right), \tag{13}$$

which proves that $T_c(n) = O(n^2/\log n)$.

3) Comments: The proof of the performance of path averaging on an RGG given in Appendix B gives insight on how to complete this last proof. It is interesting to see that the travel agency method describes how information will diffuse in the network. In the second example, far away nodes will never directly average their estimates, but they will do it indirectly, using the nodes between them.

Note that our method does not give lower bounds on $\lambda_2(\mathbb{E}[W])$, which would be useful to give an equivalent order for the $\epsilon$-averaging time $T_{\mathrm{ave}}$. In the case of path averaging, this is not an issue since it is not possible to achieve better than the consensus cost $C_c(n) = \Theta(n)$. So if the method shows that $T_c(n) = O(\sqrt{n \log n})$, we have that $C_c(n) = O(\sqrt{n \log n})\, O(\sqrt{n/\log n}) = O(n)$ and we can conclude that $C_c(n) = \Theta(n)$.

Footnote: In reality, geographic gossip will not be completely uniform but rejection sampling can be used [8] to tamper the distribution.

Fig. 6. Behavior of $\mathbb{E}[W_{ij}]$ as a function of the distance in norm 1 between $i$ and $j$ for standard gossip, geographic gossip and box-path averaging.

D. Main Results 

The main results of this paper are that the consensus costs of $(\leftrightarrow, \updownarrow)$-path averaging on grids and of box-path averaging on random geometric graphs behave linearly in the number of nodes $n$:

Theorem 4 ($(\leftrightarrow, \updownarrow)$-path averaging on grids): On a $\sqrt{n} \times \sqrt{n}$ torus grid, the consensus time $T_c(n)$ of $(\leftrightarrow, \updownarrow)$-path averaging, described in Section III-C, is $O(\sqrt{n})$. Furthermore, the consensus cost is linear: $C_c(n) = O(n)$.

Theorem 5 (Box-path averaging on RGG): Consider a random geometric graph $G(n, r)$ on the unit torus with $r(n) = \sqrt{c \log n / n}$, $c > 10$. With high probability over graphs, the consensus time $T_c(n)$ of box-path averaging, described in Section III-D, is $O(\sqrt{n \log n})$. Furthermore, the consensus cost is linear: $C_c(n) = O(n)$.

The proofs of Theorem 4 and Theorem 5 are given in the Appendix. Both proofs have the same structure: we first lower bound the entries of $\mathbb{E}[W]$ and next upper bound its second largest eigenvalue in magnitude. Figure 6 shows the behavior of $\mathbb{E}[W_{ij}]$ as a function of the $L_1$ distance between nodes $i$ and $j$ for standard gossip, geographic gossip and path averaging, respectively. The proofs give insight into the good performance of box-path averaging compared to standard gossip and geographic gossip, which can be read directly from Fig. 6. Box-path averaging concentrates the averaging intensities $\mathbb{E}[W_{ij}]$ of node $i$ in the area of nodes $j$ close to $i$. Indeed, the closer two nodes, the higher the probability that they are on the same route. Thus, as we can observe in Fig. 6, close nodes have a much higher averaging intensity $\mathbb{E}[W_{ij}]$ than in geographic gossip, where nodes are equally rarely averaged together (the proof shows an order $\sqrt{n/\log n}$ higher). However, the averaging intensity gained by close nodes is lost for far away nodes, which do not average together well anymore (a factor $n$ loss compared to geographic gossip).



In terms of the travel agency method, in box-path averaging over the unit area torus, flights that cover distances shorter than 1/2 have high capacity, whereas long distance flights are rare. To apply the method, the idea is to choose 2-hop paths: to go from node $i$ to node $j$, the path will contain two hops that stop half way, in order to exclusively and fairly use the high capacity flights. Remember that standard gossip needs $\sqrt{n/\log n}$ flights per path (see Section IV-C), which heavily penalizes the performance despite a very high averaging intensity $\mathbb{E}[W_{ij}]$ for neighboring nodes $i$ and $j$ (see Fig. 6, where $\mathbb{E}[W_{ij}]$ is large for neighboring nodes but falls to 0 for distances larger than $r(n)$). The performance of path averaging algorithms is good thanks to a diffusion scheme requiring only $O(1)$ flights in each path and $O(1)$ uses of each flight in the road map, combined with a high enough level of averaging intensity $\mathbb{E}[W_{ij}]$. Each node can act as a diffusion relay for some far away nodes, so that the whole network can benefit from the concentration of the averaging intensity.

As a summary, in contrast with geographic gossip, path averaging and standard gossip concentrate their averaging intensity on close nodes, which leads to larger coefficients $\mathbb{E}[W_{ij}]$ when nodes $i$ and $j$ are close enough. However, while standard gossip pays for its concentration with long paths overusing every existing flight, the diffusion pattern of path averaging operates in 2 steps only without creating any congestion (more precisely, we compute in the proof that each flight is used in at most 9 paths). In conclusion, the analysis shows that path averaging achieves a good tradeoff between promoting local averaging to increase averaging intensity (large $\mathbb{E}[W_{ij}]$) and favoring long distance averaging to get an efficient diffusion pattern (every path contains only $O(1)$ edges, and every edge $e$ appears in only $O(1)$ paths).

V. Conclusions 

We introduced a novel gossip algorithm for distributed averaging. The proposed algorithm operates in a distributed 
and asynchronous manner on locally connected graphs and requires an order-optimal number of communicated 
messages for random geometric graph and grid topologies. The execution of path averaging requires that each node 
knows its own location, the locations of its nearest-hop neighbors and (for the routing scheme that was theoretically
analyzed) the total number of nodes n. 

Location information is independently useful and likely to exist in many application scenarios. The key idea that 
makes path averaging so efficient is the opportunistic combination of routing and averaging. The issues of delay 
(how several paths can be concurrently averaged in the network) and fault tolerance (robustness and recovery in 
failures) remain as interesting future work. 

More generally, we believe that the idea of greedily routing towards a randomly pre-selected target (and processing 
information on the routed paths) is a very useful primitive for designing message-passing algorithms on networks 
that have some geometry. The reason is that the target introduces some directionality in the scheduling of message 
passing which avoids diffusive behavior. Other than computing linear functions, such path-processing algorithms 
can be designed for information dissemination or more general message passing computations such as marginal 
computations or MAP estimates for probabilistic graphical models [22]. Scheduling the message-passing using some 
form of linear paths can accelerate the communication required for the convergence of such algorithms. We plan
to investigate such protocols in future work. 
VI. Definitions

A. Notation

• $G(n, r)$ or RGG: random geometric graph with $n$ nodes and connection radius $r$.

• $x(0)$: vector of the initial values to be averaged.

• $x_{\mathrm{ave}} = \sum_{k=1}^{n} x_k(0)/n$.

• $x(t)$: vector of the estimates of the average.

• $S(t)$: the random set of nodes that average together at time-slot $t$.

• $R(t)$: number of one hop transmissions at time-slot $t$.

• $\varepsilon(t) = x(t) - x_{\mathrm{ave}} \mathbf{1}$: error vector, where $\mathbf{1}$ is the vector of all ones.

• $W(t)$: averaging matrix at time $t$.

• $\lambda_2$: second largest eigenvalue in magnitude.

• $\gamma_{ij}$: path starting in $i$ and ending in $j$.

• $|\gamma_{ij}|$ measures the "resistance" of path $\gamma_{ij}$ (Eq. (7)).

• $\kappa$: Poincaré coefficient (Eq. (8)).

• $T_{\mathrm{ave}}(\epsilon)$: $\epsilon$-averaging time (Def. 2).

• $C_{\mathrm{ave}}(\epsilon) = \mathbb{E}[R(1)]\, T_{\mathrm{ave}}(\epsilon)$: expected $\epsilon$-averaging cost.

• $T_c$, $C_c$: consensus time, consensus cost (Def. 1, 2).

B. List of the algorithms 

• Standard gossip: pairwise gossip where only direct neighbors can average their estimates together. 

• Geographic gossip: pairwise gossip where any pair of nodes can average their estimates together at the expense 
of routing. 

• Path averaging: at each iteration a random route is created by random greedy routing in an RGG. The nodes 
of the route average their estimates together. 

• $(\leftrightarrow, \updownarrow)$-path averaging: at each iteration a random route is created by $(\leftrightarrow, \updownarrow)$-routing on a grid (embedded on
a torus in the analysis). The nodes of the route average their estimates together.

• Box-path averaging: at each iteration a random route is created by box-routing on a regular geometric graph
(embedded on a torus in the analysis). The nodes of the route average their estimates together.
Appendix

A. Performance of $(\leftrightarrow, \updownarrow)$-path averaging on a grid

This section proves Theorem 4, which states the linearity of consensus cost for $(\leftrightarrow, \updownarrow)$-path averaging on a grid.
The analyzed algorithm is described in Section III-C.

1) Notation: We need to define the shortest distance on a torus. To this end, we introduce a torus absolute value $|\cdot|_T$ and a torus $L_1$ norm $\|\cdot\|_1$. For any algebraic value $x$ on a one dimensional torus (circle with $\sqrt{n}$ nodes) and any vector $i$ on a $\sqrt{n} \times \sqrt{n}$ two dimensional torus,

$$|x|_T = \min\left(|x|,\ |x - \sqrt{n}|,\ |x + \sqrt{n}|\right)$$

$$\|i\|_1 = |i_x|_T + |i_y|_T.$$

We call $\ell_{ij} = \|i - j\|_1$ the $L_1$ distance between nodes $i$ and $j$. The shortest routes between $I$ and $J$ have $\ell = \ell_{IJ} + 1 = |J_x - I_x|_T + |J_y - I_y|_T + 1$ nodes to be averaged, thus the non-zero coefficients of their corresponding matrices $W$ are all equal to $1/\ell$.

To each route $r$, we assign a generalized gossip $n \times n$ matrix $W^{(r)}$ that averages the current estimates of the nodes on the route. Consequently, at iteration $t$, $W(t) = W^{(r(t))}$, where $r(t)$ was randomly chosen. We call $R$ the route random variable, $s(R)$ its starting node, $d(R)$ its destination node, and $\ell(R) = \ell_{s(R)d(R)} + 1$ its number of nodes. As we choose the shortest route, the maximum number of nodes a route can contain is $\sqrt{n}$ if $\sqrt{n}$ is odd, $\sqrt{n} + 1$ if $\sqrt{n}$ is even, which can be written as $2\lfloor \sqrt{n}/2 \rfloor + 1$ in short.
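
The torus quantities defined above are easy to compute directly. The following minimal sketch uses hypothetical helper names and writes `m` for the side length $\sqrt{n}$; it evaluates $|x|_T$ and $\ell_{ij}$ and checks the maximum route size $2\lfloor \sqrt{n}/2 \rfloor + 1$ by brute force for one odd and one even side length.

```python
def torus_abs(x, m):
    """Torus absolute value |x|_T on a circle with m nodes (valid for -m < x < m)."""
    return min(abs(x), abs(x - m), abs(x + m))

def torus_l1(i, j, m):
    """L1 torus distance l_ij between grid nodes i = (ix, iy) and j = (jx, jy)."""
    return torus_abs(j[0] - i[0], m) + torus_abs(j[1] - i[1], m)

def max_route_nodes(m):
    """Largest number of nodes on a shortest route of an m x m torus (brute force)."""
    return 1 + max(torus_l1((0, 0), (x, y), m) for x in range(m) for y in range(m))

for m in (5, 6):
    print(m, max_route_nodes(m), 2 * (m // 2) + 1)
```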

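2) Evaluating $\mathbb{E}[W]$:

Lemma 1 (Expected $\mathbb{E}[W]$ on the grid): For any pair of nodes $(i, j)$, if their distance normalized to the maximum distance, $\delta_{ij} = \|i - j\|_1/\sqrt{n}$, is smaller than a constant, then

$$\mathbb{E}[W_{ij}] = \Omega\!\left(\frac{1}{n^{3/2}}\right). \tag{14}$$

More precisely,

$$\mathbb{E}[W_{ij}] \ge \frac{2\left(1 - \delta_{ij} + \delta_{ij} \log \delta_{ij}\right)}{n\sqrt{n}}.$$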
Therefore, as expected, far away nodes are less likely to be jointly averaged compared to neighboring ones (see Figure 6).

Proof: Observing that $\mathbb{E}[W^{(R)} \mid (\leftrightarrow, \updownarrow)] = \mathbb{E}[W^{(R)} \mid (\updownarrow, \leftrightarrow)]$ because the route from a node $I$ to a node $J$ horizontally first has the same nodes as the route from $J$ to $I$ vertically first, we get

$$\mathbb{E}[W] = \tfrac{1}{2}\, \mathbb{E}[W^{(R)} \mid (\leftrightarrow, \updownarrow)] + \tfrac{1}{2}\, \mathbb{E}[W^{(R)} \mid (\updownarrow, \leftrightarrow)] = \mathbb{E}[W^{(R)} \mid (\leftrightarrow, \updownarrow)].$$

So, for a given pair of nodes $(i, j)$, we can compute the $(i, j)$th entry of the matrix expectation $\mathbb{E}[W]$ by systematically routing first horizontally. Only the $(\leftrightarrow, \updownarrow)$-routes which contain both these two nodes $i$ and $j$ will have a non-zero contribution in $\mathbb{E}[W_{ij}]$. Pick such a route $r$; the $(i, j)$th entry of the corresponding averaging matrix is $W^{(r)}_{ij} = 1/\ell(r)$. We call $\mathcal{R}^{\ell}_{ij}$ the set of $(\leftrightarrow, \updownarrow)$-routes with $\ell$ nodes passing by node $i$ and by node $j$, and denote $(x)^+ = \max(x, 0)$. It is not hard to see that $(\ell - \ell_{ij})^+$ is the number of routes of length $\ell$ passing by $i$ first and $j$ next (see Fig. 7), so $|\mathcal{R}^{\ell}_{ij}| = 2(\ell - \ell_{ij})^+$. We thus have for any $i \neq j$:

Fig. 7. Counting the number of routes of length $\ell = 9$ nodes, in the case where $\ell_{ij} = 5$. There are $\ell - \ell_{ij} = 9 - 5 = 4$ possible routes with exactly $\ell$ nodes going through node $i$ then through node $j$. We admit only routes going horizontally first then vertically.

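$$\mathbb{E}[W_{ij}] = \sum_{\ell} \sum_{r \in \mathcal{R}^{\ell}_{ij}} \frac{1}{n^2}\, \frac{1}{\ell} = \frac{1}{n^2} \sum_{\ell = \ell_{ij}+1}^{2\lfloor \sqrt{n}/2 \rfloor + 1} \frac{2(\ell - \ell_{ij})}{\ell},$$

from which we can deduce that for $i \neq j$

$$\mathbb{E}[W_{ij}] \ge \frac{2}{n^2} \int_{\ell_{ij}}^{\sqrt{n}} \frac{x - \ell_{ij}}{x}\, dx = \frac{2}{n^2} \left( \sqrt{n} - \ell_{ij} - \ell_{ij} \ln \frac{\sqrt{n}}{\ell_{ij}} \right).$$

$\mathbb{E}[W_{ij}]$ decreases from $\Theta(n^{-3/2})$ to $O(n^{-5/2})$ as a function of $\ell_{ij}$. To get a normalized expression with respect to $\sqrt{n}$, we use the coefficient $\delta_{ij}$ defined in the statement of Lemma 1:

$$\mathbb{E}[W_{ij}] \ge \frac{2}{n\sqrt{n}} \left( 1 - \delta_{ij} + \delta_{ij} \ln \delta_{ij} \right).$$

This establishes the claim. In particular, if $\delta_{ij} \le 1/2$, then $\mathbb{E}[W_{ij}] \ge (1 - \ln 2)/(n\sqrt{n})$.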



3) Bounding $\lambda_2(\mathbb{E}[W])$: We now need to upper bound the second largest eigenvalue in magnitude of $\mathbb{E}[W]$, or equivalently, the relaxation time $1/(1 - \lambda_2(\mathbb{E}[W]))$.

Lemma 2 (Relaxation time):

$$\frac{1}{1 - \lambda_2(\mathbb{E}[W])} = O(\sqrt{n}). \tag{15}$$

Proof: The Poincaré inequality (Theorem 3) bounds the second largest eigenvalue of a stochastic matrix and not necessarily its second largest eigenvalue in magnitude, which is the important quantity involved in Eq. (5). It could happen that the smallest negative eigenvalue is larger in magnitude than the second largest eigenvalue. Consequently, if we show that all the eigenvalues of $\mathbb{E}[W]$ are positive, then the two eigenvalues coincide and we can use the Poincaré inequality to bound the second largest eigenvalue in magnitude. $\mathbb{E}[W]$ is symmetric so all its eigenvalues are real. The sum of all the entries along the rows of $\mathbb{E}[W]$ without counting the diagonal element is $O(1/\sqrt{n})$, whereas the diagonal elements are $\Theta(1)$, so by the Gershgorin bound [4], all the eigenvalues of $\mathbb{E}[W]$ are positive.
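
The structure of $\mathbb{E}[W]$ used in this argument can be inspected on a small instance. The sketch below builds $\mathbb{E}[W]$ exactly by enumerating all routes on an $m \times m$ torus and prints its smallest eigenvalue and relaxation time. It is an illustration under stated assumptions rather than the construction used in the proof: $m$ is taken odd so that the shortest direction is unique, the destination is drawn uniformly among all $n$ nodes (the source included), each routing order has probability 1/2, and all function names are hypothetical.

```python
import numpy as np

def signed_shift(a, b, m):
    """Shortest signed displacement from a to b on a circle with m (odd) nodes."""
    d = (b - a) % m
    return d - m if d > m // 2 else d

def route(s, d, m, horizontal_first=True):
    """Nodes of the shortest route from s to d, one coordinate after the other."""
    x, y = s
    dx, dy = signed_shift(s[0], d[0], m), signed_shift(s[1], d[1], m)
    nodes = [(x, y)]
    moves = [(0, dx), (1, dy)] if horizontal_first else [(1, dy), (0, dx)]
    for axis, delta in moves:
        step = 1 if delta > 0 else -1
        for _ in range(abs(delta)):
            if axis == 0:
                x = (x + step) % m
            else:
                y = (y + step) % m
            nodes.append((x, y))
    return nodes

def expected_path_averaging_matrix(m):
    """Exact E[W] by enumerating every (source, destination, routing order) triple."""
    n = m * m
    grid = [(x, y) for x in range(m) for y in range(m)]
    EW = np.zeros((n, n))
    for s in grid:
        for d in grid:
            for hfirst in (True, False):
                ids = [p[0] * m + p[1] for p in route(s, d, m, hfirst)]
                W = np.eye(n)
                W[np.ix_(ids, ids)] = 1.0 / len(ids)  # average along the route
                EW += W / (2 * n * n)
    return EW

m = 5
EW = expected_path_averaging_matrix(m)
eig = np.sort(np.linalg.eigvalsh(EW))
print("smallest eigenvalue:", eig[0])
print("relaxation time 1/(1 - lambda_2):", 1.0 / (1.0 - eig[-2]))
```

Each route matrix averages the nodes of the route and leaves the others untouched, so it is an orthogonal projection; this gives a second way to see why no eigenvalue of the expectation can be negative.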

We can now use the bounds on $\mathbb{E}[W]$ to bound its spectral gap.

We want to prove that path averaging performs $\sqrt{n}$ better than geographic gossip, where $\mathbb{E}[W_{ij}] = 1/n^2$ (Section IV-C.1). It is encouraging to note that for $\delta_{ij} \le 1/2$, $\mathbb{E}[W_{ij}] \ge (1 - \ln 2)/n^{3/2}$, which is a factor of order $\sqrt{n}$ better than $1/n^2$. We thus observe that it is possible to find edges with a good capacity with length equal to half of the whole graph. However very distant destinations remain problematic. Consider the extreme case of a distance $\sqrt{n}$ between two nodes $i$ and $j$. There are only two routes that will jointly average them: the route that goes from $i$ to $j$, and the reverse one. These routes are selected with probability $1/n^2$ and $W_{ij} = 1/\sqrt{n}$, implying that $\mathbb{E}[W_{ij}] = 2/n^{5/2} \ll 1/n^{3/2}$.

Formally, for each ordered and distinct pair $(i, j)$, we choose a 2-hop path $\gamma_{ij}$ from $i$ to $j$ stopping by an "airport" node $k$ chosen to be located approximately half way between $i$ and $j$. To be more precise, we define direction functions $\sigma_x$ and $\sigma_y$, where $\sigma_x(i, j) = 1$ (respectively, $\sigma_y(i, j) = 1$) if the horizontal (resp., vertical) part of the route from $i$ to $j$ goes to the right (resp., up) and $\sigma_x(i, j) = -1$ (resp., $\sigma_y(i, j) = -1$) if it goes left (resp., down). The coordinates of $k$ in the torus are:

$$k_x = i_x + \sigma_x(i, j) \left\lceil \frac{|j_x - i_x|_T}{2} \right\rceil \pmod{\sqrt{n}} \tag{16}$$

$$k_y = i_y + \sigma_y(i, j) \left\lceil \frac{|j_y - i_y|_T}{2} \right\rceil \pmod{\sqrt{n}}.$$
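
The relay construction can be sketched in a few lines. This is only an illustration: the helper names are hypothetical, `m` stands for $\sqrt{n}$ and is taken odd so that the direction functions are unambiguous, and the half-way rounding is assumed to be a ceiling in both coordinates. The example prints the relay node and the lengths of the two hops for one pair of nodes.

```python
import math

def signed_shift(a, b, m):
    """Shortest signed displacement from a to b on a circle with m (odd) nodes."""
    d = (b - a) % m
    return d - m if d > m // 2 else d

def relay(i, j, m):
    """Intermediate airport k for the 2-hop path i -> k -> j on an m x m torus."""
    k = []
    for a, b in zip(i, j):
        s = signed_shift(a, b, m)        # |s| is |j - i|_T, its sign is sigma(i, j)
        sigma = 1 if s >= 0 else -1
        # Assumption: half-way point rounded up in each coordinate
        k.append((a + sigma * math.ceil(abs(s) / 2)) % m)
    return tuple(k)

def torus_l1(i, j, m):
    """L1 torus distance between two grid nodes."""
    return sum(abs(signed_shift(a, b, m)) for a, b in zip(i, j))

m = 7
i, j = (0, 0), (3, 5)
k = relay(i, j, m)
print(k, torus_l1(i, k, m), torus_l1(k, j, m))
```

Under this rounding the two hops always add up to $\ell_{ij}$ and neither exceeds $\ell_{ij}/2 + 1$, so only short, high-capacity flights are used.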

In the road map $\gamma$ we have just constructed, the maximum flight distance is smaller than $\sqrt{n}/2 + 1$ in $L_1$ distance. Therefore, according to Lemma 1, for any edge $e$ in $\gamma$, $\mathbb{E}[W_e] \ge \eta/n^{3/2}$, where $\eta$ is a non negative constant slightly smaller than $1 - \ln 2$. Thus, for each path we have:

$$|\gamma_{ij}| = \frac{1}{\pi(i)\, \mathbb{E}[W_{ik}]} + \frac{1}{\pi(k)\, \mathbb{E}[W_{kj}]} \le \frac{2}{\eta}\, n^{5/2}. \tag{17}$$

We can now compute the Poincaré coefficient:

$$\kappa = \max_{e} \sum_{\gamma_{ij} \ni e} |\gamma_{ij}|\, \pi(i)\, \pi(j) = \frac{1}{n^2} \max_{e} \sum_{\gamma_{ij} \ni e} |\gamma_{ij}|. \tag{18}$$

To compute this sum, we need to count the number of paths $\gamma_{ij}$ in the road map that use a given flight $e$. In our construction, we have balanced the traffic load over all the short flights so that a flight $e$ belongs to at most 8 paths. Indeed, if a path contains flight $e$, then $e$ is either the first or second flight. In the first case, by construction, the second flight has to be approximately as long as $e$. Moreover, because of quantized grid effects, there are actually only 4 different possible flights a traveler in flight $e$ might take as second flight (see Fig. 8). Repeating this argument in the case where $e$ is the second flight, we then obtain that a flight $e$ appears in at most 8 paths. Combining (17) and (18), we get:

$$\kappa \le \frac{16}{\eta} \sqrt{n}.$$

As a result,

$$\lambda_2 \le 1 - \frac{\eta}{16 \sqrt{n}},$$

which yields Lemma 2. The proof is completed by using Eq. (5).

■ 

In the next Section, we generalize this proof from grids to regular geometric graphs. The approach will be the same 
but the detailed computations will be different. Also, the construction of the paths in the travel agency method will 
need some refinement. 



B. Performance of box-path averaging

We now prove Theorem 5. All the fundamental ideas coming from the proof on grids in the previous section appear here again, but sometimes in a more technical form. We have $k$ boxes forming a torus grid as in the previous section and $k = \lceil \sqrt{n/(a \log n)} \rceil^2 \approx n/(a \log n)$, for some $a > 2$.

Using regularity, each box contains a number of nodes between $a \log n$ and $b \log n$. We use the $(\leftrightarrow, \updownarrow)$-box routing scheme presented in Section III-D. There are only a few modifications to make to the grid proof in order to obtain the regular geometric graph proof. The idea is to notice that for any route $r = (r_1, r_2, \ldots, r_\ell)$, we can attribute a box route $\bar{r}$ consisting of the boxes the nodes of $r$ belong to. If we call $b(i)$ the box node $i$ belongs to, then $\bar{r} = (b(r_1), b(r_2), \ldots, b(r_\ell))$. We call $n_i$ the number of nodes in the box $b(i)$ node $i$ belongs to. The sequence of $n_i$ is fixed by the graph we are considering. $\ell_{ij}$ is the $L_1$ distance between boxes $b(i)$ and $b(j)$: $\ell_{ij} = \|b(j) - b(i)\|_1$.

We denote by $\ell(\bar{r})$ the number of nodes in route $\bar{r}$, $s(\bar{r})$ the starting box of route $\bar{r}$ and $d(\bar{r})$ its destination box. In our problem the chosen route is random, which we will denote by a capital letter: $R$, leading to other random variables $\bar{R}$, $\ell(\bar{R})$, $s(\bar{R})$, etc.

Fig. 8. Number of paths including an edge $e = (A, B)$ in the road map. Paths have two hops of equal length, where equality here is defined up to grid effects. Therefore, for a given edge $e$, there are at most 8 paths including $e$: $(1, A, B)$, $(2, A, B)$, $(3, A, B)$, $(4, A, B)$ and $(A, B, 5)$, $(A, B, 6)$, $(A, B, 7)$, $(A, B, 8)$.

1) Evaluating $\mathbb{E}[W]$:

Lemma 3 (Expected $\mathbb{E}[W]$ on the regular geometric graph): For any pair of nodes that do not belong to the same box, if their grid-distance normalized to the maximum grid-distance, $\delta_{ij} = \ell_{ij}/\sqrt{k}$, is smaller than a constant, then

$$\mathbb{E}[W_{ij}] = \Omega\!\left(\frac{1}{n\sqrt{n \log n}}\right). \tag{19}$$

More precisely, $\mathbb{E}[W_{ij}]$ satisfies a lower bound of the same form as in Lemma 1, with $\delta_{ij} = \ell_{ij}/\sqrt{k}$ and constants depending on $a$ and $b$ (Eq. (20)).

Proof: For any node $i$ and node $j$ that do not belong to the same box, we want to compute the expectation of $W_{ij}$. Counting the routes in this setting is complicated because each sender has at least $a \log n$ nodes to send its message to. In order to use our simple analysis of the grid, we condition the expectation on the box route $\bar{R}$. Given a box route, $W_{ij} = 0$ if $i$ or $j$ is not in the box route. On the contrary, if they both are in the box route, then $W_{ij} = 1/\ell(\bar{R})$ with probability $1/(n_i n_j)$. Indeed, if $i$ (or $j$) is in the starting box, the probability that $i$ is the starting node is $1/n_i$, because all the nodes wake up with the same rate. If $i$ (or $j$) is in another box of the given box route, then the probability that $i$ is chosen is $1/n_i$ as well, because the routing chooses the next node uniformly among the nodes of the next box. We therefore write

$$\mathbb{E}[W_{ij}] = \mathbb{E}_{\bar{R}}\big[\mathbb{E}_{r}[W_{ij} \mid \bar{R}]\big].$$
From now on, we are back to a problem with routes on a grid which has $k$ "nodes". The difference with the previous section is that routes are no longer uniform. Indeed, now, boxes wake up more frequently if they contain more nodes: the probability that box $b_i$ wakes up is $n_i/n$. Destination boxes are still chosen uniformly at random with probability $1/k$ because there are $k$ boxes in total. Just as before, we consider only $(\leftrightarrow, \updownarrow)$-box routes so that a box route is entirely determined by its starting box and its destination box, and we count box routes of different length separately. Let $\mathcal{R}^{\ell}_{ij}$ be the set of box routes of size $\ell$ including $b_i$ and $b_j$.

$$\mathbb{E}[W_{ij}] = \frac{1}{n_i n_j} \sum_{\ell = \ell_{ij}+1}^{2\lfloor \sqrt{k}/2 \rfloor + 1}\ \sum_{\bar{r} \in \mathcal{R}^{\ell}_{ij}} \frac{1}{\ell(\bar{r})}\, P\big[s(\bar{R}) = s(\bar{r}),\ d(\bar{R}) = d(\bar{r})\big]$$

$$= \frac{1}{n_i n_j} \sum_{\ell = \ell_{ij}+1}^{2\lfloor \sqrt{k}/2 \rfloor + 1}\ \sum_{\bar{r} \in \mathcal{R}^{\ell}_{ij}} \frac{1}{\ell}\, \frac{n_{s(\bar{r})}}{n k}.$$

We now use the regularity of the graph: for any node $m$, $a \log n \le n_m \le b \log n$.

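$$\mathbb{E}[W_{ij}] \ge \frac{a \log n}{(b \log n)^2}\, \frac{1}{n k} \sum_{\ell} \frac{|\mathcal{R}^{\ell}_{ij}|}{\ell} \ge \frac{2a}{b^2\, n k \log n} \left( \sqrt{k} - \ell_{ij} - \ell_{ij} \ln \frac{\sqrt{k}}{\ell_{ij}} \right).$$

The last inequality comes from the same computation as for the grid, and it can be reformulated as in Lemma 3 when using the normalized distance coefficient $\delta_{ij} = \ell_{ij}/\sqrt{k}$. ■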

2) Bounding $\lambda_2(\mathbb{E}[W])$:

Lemma 4 (Relaxation time RGG):

$$\frac{1}{1 - \lambda_2(\mathbb{E}[W])} = O\!\left(\sqrt{n \log n}\right). \tag{21}$$

Proof: As for the grid, we now apply the travel agency method. The situation is very similar to the grid case, except that boxes now contain $\Theta(\log n)$ nodes each.

Similarly to the grid case, we will be using 2-hop paths for every pair of nodes, by adding one intermediate stop half-way. More precisely, this intermediate stop is chosen in the box whose coordinates on the underlying lattice are given by Eq. (16), where $i$ and $j$ are the lattice coordinates of the source and destination boxes. Then, within each box, we need to carefully and fairly assign the intermediate nodes because a flight should not be used more than a constant number of times (it was 8 for the grid), otherwise it would create congestion. It is not hard to design such road maps because the number of nodes in each box varies at most by a constant multiplicative factor $b/a$.

To show this, assume that each box contains exactly $\log n$ nodes. Then, there are $(\log n)^2$ road maps to find between all the nodes in a pair of boxes (assume box 1 and 3, and let box 2 be the one half-way), but happily enough, there are $(\log n)^2$ flights between box 1 and box 2 and also between box 2 and 3. Therefore, as we can see in Fig. 9, the box path (box 1, box 2, box 3) can correspond to $(\log n)^2$ node road maps all using different flights (edges). This flight allocation technique can easily be extended to cases where the boxes do not have the same number of airports by using some flights at most $\lceil b/a \rceil$ times each in the paths between two given boxes. There is a second refinement to the grid proof: solving the problem for nodes that share a common box, which do not average jointly (our bound on $\mathbb{E}[W_{ij}]$ is zero). However there are many edges to nodes in neighboring boxes. So formally, if node $i$ and node $j$ are in the same box, we design the road map from $i$ to $j$ to be a two hop road map stopping at a node located in the box above their box. By sharing fairly the available relay airports, the short north-south flights might be used in $\lceil b/a \rceil$ extra road maps.

Fig. 9. Path allocation when there are 3 nodes per box and thus 9 paths to design.

We can thus construct road maps for any pair of airports that will use each good intensity flight at most $9\lceil b/a \rceil$ times. The rest of the proof is identical to the grid proof. For each path we have:

$$|\gamma_{ij}| = \frac{1}{\pi(i)\, \mathbb{E}[W_{ik}]} + \frac{1}{\pi(k)\, \mathbb{E}[W_{kj}]} \le c\, n^2 \sqrt{n \log n}, \tag{22}$$

for some constant $c$. Inequality (22) was obtained with the same reasoning as in the grid. We therefore conclude, using the Poincaré coefficient argument, that

$$\kappa \le 9 \left\lceil \frac{b}{a} \right\rceil c\, \sqrt{n \log n}.$$

As a result, for $n$ large enough, and some constant $c'$,

$$\lambda_2 \le 1 - \frac{1}{c' \sqrt{n \log n}},$$

which yields the lemma.
C. Regularity of random geometric graphs

Lemma 5 (Regularity of random geometric graphs): Consider a random geometric graph with $n$ nodes and partition the unit square in boxes of size $a \frac{\log n}{n}$. Then, all the boxes contain $\Theta(\log n)$ nodes, with high probability as $n \to \infty$.

Proof: Let $X_i$ denote the number of nodes contained in the $i$th box. The $X_i$ are (non-independent) binomially distributed random variables with expectation $a \log n$. Standard Chernoff bounds [16] (we do not optimize for the constants) imply:

$$P\!\left(X_i < \tfrac{a}{2} \log n\right) < e^{-\frac{a}{8} \log n}$$

and

$$P\!\left(X_i > 2a \log n\right) < e^{-\frac{a}{8} \log n},$$

which give tight bounds on the number of nodes in each box:

$$P\!\left(\tfrac{a}{2} \log n < X_i < 2a \log n\right) > 1 - 2 e^{-\frac{a}{8} \log n}. \tag{23}$$

A union bound over boxes yields the uniform bounds on the maximum and minimum load of a square:

$$P\!\left(\tfrac{a}{2} \log n < \min_i X_i \le \max_i X_i < 2a \log n\right) > 1 - 2 n^{-a/8}\, \frac{n}{a \log n}.$$

Therefore, selecting $a > 8$ yields the lemma. A more technical proof shows that the lemma holds for $a > 2$. ■
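
The regularity statement can be observed empirically. The following sketch is an illustration under assumptions of our own: nodes are drawn uniformly in the unit square with NumPy's default generator, the number of boxes per side is $\lceil \sqrt{n/(a \log n)} \rceil$ (so boxes are slightly smaller than $a \log n / n$), and the function name and the values of $n$ and $a$ are arbitrary. It reports the minimum and maximum box loads relative to the mean load.

```python
import math
import numpy as np

def box_occupancy(n, a, seed=0):
    """Count uniformly placed nodes per box when the unit square is cut into
    roughly n / (a log n) equal square boxes."""
    rng = np.random.default_rng(seed)
    side = math.ceil(math.sqrt(n / (a * math.log(n))))   # boxes per side
    pts = rng.random((n, 2))                              # node positions in [0, 1)^2
    cells = np.minimum((pts * side).astype(int), side - 1)
    counts = np.zeros((side, side), dtype=int)
    np.add.at(counts, (cells[:, 0], cells[:, 1]), 1)      # unbuffered accumulation
    return counts

n, a = 20000, 8.0
counts = box_occupancy(n, a)
mean = n / counts.size
print("boxes:", counts.size, "mean load:", round(mean, 1))
print("min/mean:", counts.min() / mean, "max/mean:", counts.max() / mean)
```

Lemma 5 predicts that, for large $n$ and $a > 8$, both ratios stay between 1/2 and 2 with high probability.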